Tuning Java Garbage Collection for Spark Applications

نویسنده

  • Jie Huang
چکیده

Spark is gaining wide industry adoption due to its superior performance, simple interfaces, and a rich library for analysis and calculation. Like many projects in the big data ecosystem, Spark runs on the Java Virtual Machine (JVM). Because Spark can store large amounts of data in memory, it has a major reliance on Java’s memory management and garbage collection (GC). New initiatives like Project Tungsten will simplify and optimize memory management in future Spark versions. But today, users who understand Java’s GC options and parameters can tune them to eek out the best the performance of their Spark applications. This article describes how to configure the JVM’s garbage collector for Spark, and gives actual use cases that explain how to tune GC in order to improve Spark’s performance. We look at key considerations when tuning GC, such as collection throughput and latency.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Tuning J2EE Application Servers

Ever since the introduction of the J2EE Enterprise Application specification from SUN Microsystems, this new technology has experienced tremendous growth. This paper will discuss tuning approaches for J2EE application servers. After covering tuning options at the architectural level of a J2EE application, we discuss the impact of garbage collection and the JVM on application server performance,...

متن کامل

Tuning Java’s Memory Manager for High Performance Server Applications

Java is a strong player in the application server market and thus the performance of its virtual machine is an important aspect of a server’s performance. One of the components that affect the performance of a JVM is the memory manager, which also includes the garbage collector. Modern virtual machines offer a multitude of options for tuning the memory manager, which can have a significant impa...

متن کامل

Trash Day: Coordinating Garbage Collection in Distributed Systems

Cloud systems such as Hadoop, Spark and Zookeeper are frequently written in Java or other garbage-collected languages. However, GC-induced pauses can have a significant impact on these workloads. Specifically, GC pauses can reduce throughput for batch workloads, and cause high tail-latencies for interactive applications. In this paper, we show that distributed applications suffer from each node...

متن کامل

Adaptive Garbage Collection for Battery-Operated Environments

Energy is an important constraint for battery-operated embedded Java environments. In this work, we show how the garbage collector (GC) can be tuned to reduce the energy consumption of Java applications. In particular, we show the importance of tuning the frequency of invoking GC based on object allocation and garbage creation rates to optimize leakage energy consumption. We reduce the leakage ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2015